
[SPARK-13019][Docs] Replace example code in mllib-statistics.md using include_example #11108

Closed
wants to merge 24 commits

Conversation

keypointt
Contributor

https://issues.apache.org/jira/browse/SPARK-13019

The example code in the user guide is embedded in the markdown and hence it is not easy to test. It would be nice to automatically test them. This JIRA is to discuss options to automate example code testing and see what we can do in Spark 1.6.

Goal is to move actual example code to spark/examples and test compilation in Jenkins builds. Then in the markdown, we can reference part of the code to show in the user guide. This requires adding a Jekyll tag that is similar to https://github.com/jekyll/jekyll/blob/master/lib/jekyll/tags/include.rb, e.g., called include_example.
{% include_example scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala %}
Jekyll will find examples/src/main/scala/org/apache/spark/examples/mllib/SummaryStatisticsExample.scala and pick code blocks marked "example" and replace code block in
{% highlight %}
in the markdown.

See more sub-tasks in parent ticket: https://issues.apache.org/jira/browse/SPARK-11337
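For concreteness, here is a minimal sketch of how such an example file is marked up. This is not one of the PR's actual files — the object name and the statistic computed are illustrative — but the `$example on$` / `$example off$` marker comments match the convention the PR uses, and only the region between them is pulled into the generated user guide:

```scala
// Illustrative sketch of a hypothetical example file under
// examples/src/main/scala/org/apache/spark/examples/mllib/.
// The include_example Jekyll tag extracts just the marked region.
object SummaryStatisticsSketch {
  def main(args: Array[String]): Unit = {
    // $example on$
    val observations = Seq(1.0, 2.0, 3.0)
    val mean = observations.sum / observations.size
    println(s"mean: $mean")
    // $example off$
  }
}

SummaryStatisticsSketch.main(Array.empty) // run when executed as a script
```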

@mengxr
Contributor

mengxr commented Feb 11, 2016

ok to test

@mengxr
Contributor

mengxr commented Feb 11, 2016

cc @yinxusen

@SparkQA

SparkQA commented Feb 11, 2016

Test build #51145 has finished for PR 11108 at commit a4dd0fb.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 16, 2016

Test build #51352 has finished for PR 11108 at commit 0df3e65.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 20, 2016

Test build #51602 has finished for PR 11108 at commit d817d0b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 21, 2016

Test build #51646 has finished for PR 11108 at commit f945222.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 23, 2016

Test build #51800 has finished for PR 11108 at commit aec10ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


object SummaryStatisticsExample {

def main(args: Array[String]) {
Contributor

def main(args: Array[String]): Unit = {

# $example off$

if __name__ == "__main__":
    sc = SparkContext(appName="HypothesisTestingKolmogorovSmirnovTestExample")  # SparkContext
Contributor

remove comment

@mengxr
Contributor

mengxr commented Mar 16, 2016

@keypointt @yinxusen I made one pass and left some comments inline. One issue is whether we want to print empty lines; it doesn't seem necessary for example code appearing in the user guide. There are also some redundant comments, which we should remove. But overall, this looks great!

@yinxusen
Contributor

@mengxr If there are no empty lines, we get a solid block of output that is hard to read. I think it's a good experience for users to run ./bin/run-example ml.xxxExample and see some tidy output directly.

@mengxr
Contributor

mengxr commented Mar 17, 2016

I agree. But we don't really need lines just to print empty lines. For example:

println(goodnessOfFitTestResult)
println()

could be simply

println(s"$goodnessOfFitTestResult\n")

It is good to keep the example code short.
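As a quick, self-contained sanity check (the string value here is a stand-in; in the real example `goodnessOfFitTestResult` is a test-result object, and only its toString matters), the interpolated one-liner emits exactly the same characters as the two separate println calls:

```scala
// Stand-in for the test-result object; any value with a toString behaves the same.
val goodnessOfFitTestResult = "Chi squared test summary"

// println(x) followed by println() emits x, a newline, then a bare newline...
val twoCalls = goodnessOfFitTestResult + "\n" + "\n"
// ...which is exactly what the single interpolated call emits.
val oneCall = s"$goodnessOfFitTestResult\n" + "\n"

assert(twoCalls == oneCall)
```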

@yinxusen
Contributor

Sure, that makes sense.


@SparkQA

SparkQA commented Mar 17, 2016

Test build #53460 has finished for PR 11108 at commit a4eb28d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@keypointt
Contributor Author

Hi @mengxr, how do you like this change?

@mengxr
Contributor

mengxr commented Mar 21, 2016

LGTM. Merged into master. Thanks!

@asfgit asfgit closed this in 1af8de2 Mar 21, 2016
@keypointt
Contributor Author

Thank you!

@yhuai
Contributor

yhuai commented Mar 21, 2016

Seems this one breaks scala 2.10 compilation (https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-sbt-scala-2.10/547/console). Can you fix it?

@keypointt
Contributor Author

Oh sure, I'll try to fix it ASAP.

@keypointt
Contributor Author

@yhuai I've just tested on my local machine, and there is no compile error and the example output results correctly.

Also, it seems all the parameters are right according to the method documentation for sampleByKey() and sampleByKeyExact(): http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

And the build before merging to master was also successful...I'm not quite sure why this master build failed...

@mengxr
Contributor

mengxr commented Mar 22, 2016

I reverted the change in master to unblock other builds. @keypointt Could you test it on Scala 2.10? It seems that Scala 2.10 doesn't resolve keyword arguments nicely:

def f(a: String, b: String, c: String = "xxx") = {
  println(a, b, c)
}

f(a = "abc", "def")
<console>:34: error: not enough arguments for method f: (a: String, b: String, c: String)Unit.
Unspecified value parameter b.
              f(a = "abc", "def")

We should either fix the seed or use keyword argument for fractions.
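A self-contained sketch of the workaround (the function name and types here are illustrative, mirroring only the shape of sampleByKey): Scala 2.10 rejects a named argument followed by a positional one, but naming every argument compiles on both 2.10 and 2.11:

```scala
// Illustrative stand-in for the shape of PairRDDFunctions.sampleByKey:
// a flag, a required fractions map, and a defaulted seed.
def sample(withReplacement: Boolean,
           fractions: Map[String, Double],
           seed: Long = 42L): String =
  s"replace=$withReplacement fractions=$fractions seed=$seed"

// Breaks on Scala 2.10 (named argument before a positional one):
//   sample(withReplacement = false, Map("a" -> 0.1))
// Portable fix: name every argument, as the follow-up PR does for fractions.
val out = sample(withReplacement = false, fractions = Map("a" -> 0.1))
println(out)
```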

@keypointt
Contributor Author

Oh yes, I just tried both 2.10 and 2.11: the 2.11 build succeeded but the 2.10 build failed.

@keypointt
Contributor Author

@mengxr I've added the keyword argument at my forked repo: keypointt@892fe60
And for demonstration purposes, should 'seed' be explicitly added? Or should I remove the val 'seed' in this commit?

Should I open a separate PR, since the new commit is not syncing to this PR?

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
… include_example

Author: Xin Ren <[email protected]>

Closes apache#11108 from keypointt/SPARK-13019.
@keypointt
Contributor Author

Hi @mengxr, I'm using the command below to compile this branch (after merging in the latest master branch) with scala-2.10:
build/sbt -Pyarn -Phadoop-2.3 -Pkinesis-asl -Pspark-ganglia-lgpl -Phive -Phive-thriftserver -Pscala-2.10 compile test:compile

and I'm always getting the error sbt.ResolveException: unresolved dependency: org.scala-lang#jline;2.11.7: not found, where the jline version is set at

spark/pom.xml, line 157 (at 48978ab):

<jline.version>${scala.version}</jline.version>

I'm assuming the master branch now defaults to scala-2.11, so when building for 2.10 there could be some dependency-resolution issues?

@mengxr
Contributor

mengxr commented Mar 22, 2016

asfgit pushed a commit that referenced this pull request Mar 24, 2016
… mllib-statistics.md using include_example

## What changes were proposed in this pull request?

This PR for ticket SPARK-13019 is based on the previous PR (#11108).
Since PR #11108 broke the scala-2.10 build, more work was needed to fix the build errors.

What is new in this PR is adding keyword arguments for 'fractions':
`    val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)`
`    val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)`

I reopened the ticket on JIRA, but sorry, I don't know how to reopen a GitHub pull request, so I'm just submitting a new one.

## How was this patch tested?

Manual build testing on local machine, build based on scala-2.10.

Author: Xin Ren <[email protected]>

Closes #11901 from keypointt/SPARK-13019.